The so-called corner turning problem is a major bottleneck for radio telescopes with large numbers
of antennas. The problem is essentially that of rapidly transposing a matrix that is too large to store
on a single device; in radio interferometry, it occurs because data from each antenna needs to be
routed to an array of processors, each of which handles a limited portion of the data (a frequency
range, say) but requires input from every antenna. We present a low-cost solution allowing the
correlator to transpose its data in real time, without contending for bandwidth, via a butterfly
network requiring neither additional memory nor expensive general-purpose switching hardware.
We discuss possible implementations of this using FPGA, CMOS, analog logic and optical technology,
and conclude that the corner turner cost can be small even for upcoming massive radio arrays.



I. INTRODUCTION 

There is now strong community interest in building
more sensitive radio telescopes, stemming from diverse
science opportunities that range from planets to pulsars,
from black holes to cosmology [1-9]. However, greater
sensitivity requires greater collecting area, which in turn
increases cost. For large collecting area, interferometers
become more cost-effective than single-dish radio
telescopes, but pose interesting engineering challenges.

One such challenge is that for a general large
interferometer, the cost grows quadratically with area, because
for an array of N antennas, all N(N - 1)/2 ~ N^2 pairs of
antennas need to be correlated to calculate the so-called
visibilities. The computational costs thus scale as N^2 and
dwarf all other costs for large enough N. Fortunately,
there are attractive design approaches such as tiling, the
MOFF correlator [10] and the omniscope [11, 12], which
enable more competitive cost scaling, some having costs
that grow as slowly as N log N with size without losing
any information [12].
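
To make the difference between these two scalings concrete, the following minimal sketch tabulates the number of antenna pairs N(N - 1)/2 against N log_2 N for a few array sizes. The function names are ours and purely illustrative, and constant prefactors (which depend on the hardware) are ignored.

```python
import math


def pair_count(n: int) -> int:
    """Number of distinct antenna pairs, N(N - 1)/2."""
    return n * (n - 1) // 2


def n_log_n(n: int) -> float:
    """N log2 N, the slowest cost growth mentioned in the text (prefactor ignored)."""
    return n * math.log2(n)


if __name__ == "__main__":
    for n in (64, 1024, 2**20):
        print(f"N = {n:>8}: pairs = {pair_count(n):>14}, N log2 N = {n_log_n(n):>12.0f}")
```

For a million-antenna array the pair count exceeds N log_2 N by more than four orders of magnitude, which is why the alternative designs are attractive.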

FIG. 1: Toy example of how the N = 8 corner turning problem
could be solved using moving parts, with 8 devices
simultaneously transmitting data into the 8 red/dark grey links
on the left and 8 other devices receiving data from the 8
green/light grey links on the right. After 8 successive 45°
rotations of the right wheel, all input devices have
transmitted the required data to all output devices.

A second bottleneck, common to all the above-mentioned
approaches, is what is known as the corner turning problem.
The problem is essentially that of rapidly transposing a
matrix that is too large to store on a single device; in radio
interferometry, it occurs because data from each antenna
needs to be routed to an array of processors, each of which
handles a limited portion of the data (a frequency range,
say) but requires input from every antenna. Most often, each
of the N antenna signals can be filtered, amplified,
digitized and decomposed into channels separately from all
other antenna signals. Suppose that the signal from each
antenna is digitized and processed so as to produce a sample
stream from each of M separate frequency channels (see
footnote 1). If we imagine all this data arranged in an
N x M matrix, then each row of the matrix resides on a
separate physical device (typically an FPGA, a
field-programmable gate array). However, the subsequent
computation of UV plane visibilities needs to combine
information from all N antennas, separately for each
frequency, i.e., each column of the matrix needs to be
processed separately and rapidly. For a traditional
interferometer, this second stage involves multiplying the
N(N - 1)/2 pairs of numbers, the so-called X operation, as
distinguished from the preceding so-called F operation that
Fourier transformed in the time domain. For an omniscope,
this second stage involves performing a Fourier transform in
two or more dimensions [12]. In either case, the amount of
computation required for the second stage is at least
proportional to N, so we need roughly M >= N channels to
distribute the computation. In either case, the corner
turning is a major bottleneck, since transferring data
between all N x M pairs of first-stage and second-stage
devices requires moving the entire contents of the matrix
across a large network. For example, for an N = 64 x 64 dual
polarization omniscope sampling at 400 MHz, the corner turner
has to route about 13 terabytes per second. Once this
bottleneck has been passed and the second stage has been
completed, however, the resulting sky maps (or their Fourier
transforms) can be time-averaged, dramatically reducing the
data rate to manageable levels.

1 For simplicity, we ignore the polarization issue in our
discussion, since it can be trivially incorporated by simply
doubling N and treating each of the two polarization channels
from each antenna as an independent data stream.
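
The data rate quoted for the 64 x 64 omniscope can be checked with a short calculation. The sketch below assumes 4 bytes per sample per polarization stream; this sample width is our assumption (the text does not state it), chosen because it reproduces the figure of about 13 terabytes per second. The function name is hypothetical.

```python
def corner_turn_rate_bytes_per_s(n_antennas: int, n_polarizations: int,
                                 sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Aggregate data rate entering the corner turner, in bytes per second."""
    return n_antennas * n_polarizations * sample_rate_hz * bytes_per_sample


# 64 x 64 antennas, dual polarization, 400 MHz sampling.
# bytes_per_sample=4 is an assumed value, not taken from the text.
rate = corner_turn_rate_bytes_per_s(64 * 64, 2, 400_000_000, 4)

if __name__ == "__main__":
    print(f"{rate / 1e12:.1f} TB/s")
```

With a different sample width the total scales proportionally, so the figure should be read as an order-of-magnitude estimate.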

Modern radio telescopes have typically adopted one of 
the following solutions to the corner turning problem: 

1. Writing the entire matrix to a single, extremely
fast, giant memory module where it can be read
out transposed, or using some other device with
size O(M^2), for example enough wires to turn the
corner directly.

2. Routing all the data through an off-the-shelf non- 
blocking switch. 

3. Using enough wires to make all the connections di- 
rectly. 

The first approach has been used by numerous
experiments, and the second has been successfully implemented
in the packetized CASPER correlator [13, 14] used by the
PAPER experiment [15], where N = 32 (including
polarization) is small enough to be handled by a single 10 Gb
Ethernet switch. The third is used in some very large
correlators such as EVLA and ALMA. Unfortunately, all
of these approaches become expensive for very large N,
which makes it timely to explore alternative solutions.

Another way of thinking about the corner turning
problem is that it involves the only part of an
interferometer that is not embarrassingly parallel (see
footnote 2): it is easy to build many antennas, many A/D
converters, many time-domain Fourier transformers and many
correlators acting on separate frequency bands. The corner
turn is the piece that transposes the data matrix to keep
the processing embarrassingly parallel.

FIG. 2: How to solve the N = 8 corner turning problem using
a butterfly network, with 8 devices simultaneously
transmitting data from the left and 8 other devices receiving
data from the right. After the three control bits shown at
the top have looped through all 8 combinations 000, 001,
010, 011, 100, 101, 110 and 111, all input devices have
transmitted the required data to all output devices. The
boxes ("controlled swappers") have two input wires and
either pass them straight through to their output wires or
swap them, depending on whether the control bit (drawn as
entering from above) is 0 or 1, respectively. The 8 inputs
are numbered in binary on the left hand side, and we see
that the 1st row of swappers can flip their 1st address bit,
the 2nd row of swappers can flip their 2nd bit, and the 3rd
row of swappers can flip their 3rd bit.

The rest of this paper is organized as follows. In
Section II, we present our solution to the corner turning
problem, which requires neither general-purpose network
switches nor additional memory. We discuss various
physical implementation options in Section III and
summarize our conclusions in Section IV.



II. THE BUTTERFLY ALGORITHM 

Below we will limit our discussion to the special case
where M = N, i.e., where the matrix to be transposed is
square, or, equivalently, where there are equal numbers
of devices writing to and reading from the corner turner,
since if one has an efficient corner turner for this case,
then the general case becomes easy to solve. For example,
one can simply pad the matrix with zeros, which
is equivalent to inserting dummy data sources or sinks,
or combine a few adjacent entries into one larger entry,
splitting it at the end if necessary. We will present
an optimized solution for the case where N is a power of
2.

2 Computer scientists say that a problem is "embarrassingly
parallel" if it can be trivially distributed across a large
number of processors that do not need to communicate with
each other.



A. The problem 

In the discussion below, we will refer to the concept of a
"link," by which we mean a connection between two
computers, FPGAs, or any other devices (nodes) that
can carry N matrix entries each time a matrix is
transposed. In most cases, this will simply be
the rate at which the F stage of the correlator outputs
data for a single antenna. In other words, the data rate
on a given link is independent of the size of the
interferometer. If the data rate for a single channel exceeds
that of the technology we use to build our corner turner
(e.g., 1 Gbps if we used gigabit Ethernet connections), then
we will use bundles of identical connections as our links.

It is easy to solve the corner turning problem with 
a non-blocking switch that can connect 2N links: each 
source node that starts with one row of the matrix simply 
transmits all of its data, with each entry in the matrix 
addressed to the sink node labeled with the column num- 
ber of that entry. Each node sends or receives at exactly 
the link rate, so a general-purpose nonblocking switch 
can handle all of the data. 

Large general-purpose high-speed switches are expen- 
sive because they are fully non-blocking, allowing any set 
of input devices to simultaneously transmit to any set of 
output devices. This is overkill for the corner turning 
problem, since we have complete prior knowledge of how 
data needs to be distributed. This suggests the possibil- 
ity of reducing cost by giving up complete generality. 

We need each of the N source nodes to transmit data
through our corner turner to each of the N sink nodes,
with each source node transmitting exactly a 1/N
fraction of its total data rate to each sink node. This can
be done with a very restricted kind of switch using no
memory at all. Such a switch has N states, selected by a
control input c in {0, ..., N - 1}, where the source node
labeled with a number i is connected to the sink node
j = p(c, i), where the function p has the following
properties:

1. For fixed i, all p(c, i) are unique and in the range
0, ..., N - 1.

2. For fixed c, all p(c, i) are unique and in the range
0, ..., N - 1.

In other words, the corner turner performs a different
permutation of the inputs at each time step, such that
after N steps, every input node has been connected to
every output node exactly once.

With such a switch, each source node i transmits the
(i, j) entry of the matrix exactly when p(c, i) = j, and each
sink node j will receive the entire column j, albeit in
some arbitrary order. This means that each source node
transmits data in a different order, but most receiver or
F stage designs should be able to handle this without
difficulty.

B. Our solution 

How should we choose the sequence of permutations p?
There are clearly vast numbers of permutation sequences
that satisfy the two requirements above, since we have
N! choices even for p(0, i) alone.

1. A mechanical solution 

One simple solution is that defined by the cyclic
permutations

p(c, i) = (c + i) mod N.    (1)

This choice is illustrated in Figure 1 for a toy example
where N = 8. If we connect the input devices
to the metal bars protruding on the left side and the
output devices to the bars protruding to the right, then
N successive 45° rotations of the right wheel will
achieve a complete corner turn where every input device
has transmitted to every output device.
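
The cyclic schedule of Equation (1) can be checked in software. The following minimal sketch verifies the two required properties of p and then simulates a memoryless corner turn, confirming that every sink ends up with one full column of the matrix. All function names are ours and hypothetical; the simulation models only the routing, not the hardware.

```python
def cyclic(c: int, i: int, n: int) -> int:
    """Equation (1): at control step c, source i is connected to sink (c + i) mod n."""
    return (c + i) % n


def is_valid_schedule(p, n: int) -> bool:
    """Check both properties: for fixed i and for fixed c, p(c, i) covers 0..n-1."""
    full = set(range(n))
    fixed_i_ok = all({p(c, i, n) for c in range(n)} == full for i in range(n))
    fixed_c_ok = all({p(c, i, n) for i in range(n)} == full for c in range(n))
    return fixed_i_ok and fixed_c_ok


def corner_turn(matrix, p):
    """Send entry (i, j) from source i to sink j = p(c, i) and collect what each sink gets."""
    n = len(matrix)
    received = [dict() for _ in range(n)]      # sink j -> {source i: entry}
    for c in range(n):                         # one permutation per time step
        for i in range(n):
            j = p(c, i, n)
            received[j][i] = matrix[i][j]
    # Entries arrive in a schedule-dependent order; index them by source to compare.
    return [[received[j][i] for i in range(n)] for j in range(n)]


n = 8
matrix = [[10 * i + j for j in range(n)] for i in range(n)]
transposed = corner_turn(matrix, cyclic)
```

Any other function p satisfying the two properties can be passed to `corner_turn` in place of `cyclic`.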

2. The butterfly algorithm 

In practice, one of course needs to accomplish all
operations electronically without large moving parts. An
elegant method for implementing precisely the cyclic
permutations of Figure 1 electronically was discovered about
a decade ago by Lynn Urry and implemented for the
Allen Telescope Array [8, 9], but this was unfortunately
never published in a journal and did not become as widely
known as it deserves to be. The other authors of this
paper independently discovered the methods that we will
describe below, which have the further advantage of
being even cheaper to implement.

We schematically illustrate a simple solution in
Figure 2, where the boxes ("controlled swappers") have two
input wires and either pass them straight through to their
output wires or swap them, depending on whether a
control bit (drawn as entering from above) is 0 or 1,
respectively. If the N = 8 inputs i are numbered in binary,
then the 1st row of swappers can flip their 1st bit, the
2nd row of swappers can flip their 2nd bit, and the 3rd
row of swappers can flip their 3rd bit. This means that
this corner turner implements the permutations

p(c, i) = c xor i,    (2)

where the integers c, i and j are written in binary on the
top, left and right sides of Figure 2, respectively.

This basic network topology, where a given node
successively "talks" with nodes separated by 2^0, 2^1, 2^2, etc.,
appears in a wide range of electrical engineering and
software applications, including the Fast Fourier transform
of N numbers, and is often referred to as a "butterfly
network". When the "talk" part is a swapper like in
Figure 2, the resulting network is a special case of a Banyan
network [16], a type of general-purpose network switch
which is nonblocking for certain permutations (as
opposed to fully nonblocking, which would allow nodes to
talk to each other in any permutation). A key point
about the corner turner that we are proposing is that we
are not using it as a general-purpose switch but rather
with a specific algorithm: to cycle through N very
particular permutations, which is precisely what is needed
to solve the problem at hand.
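
The controlled-swapper network of Figure 2 can be modeled in a few lines. The sketch below applies log_2 N rows of swappers to a list of wires and records where each input ends up, which should reproduce Equation (2). The assignment of swapper rows to address bits (row b acts on bit b, counted from the least significant bit) is our assumption; the xor result does not depend on the order of the rows. Function names are hypothetical.

```python
def controlled_swap_stage(wires, bit: int, control: int):
    """One row of controlled swappers: exchange positions that differ in address bit `bit`."""
    if not control:
        return list(wires)                     # control bit 0: pass straight through
    return [wires[pos ^ (1 << bit)] for pos in range(len(wires))]


def butterfly(wires, c: int):
    """Apply log2(N) rows of swappers; row b is driven by bit b of the control word c."""
    n_bits = (len(wires) - 1).bit_length()
    for bit in range(n_bits):
        wires = controlled_swap_stage(wires, bit, (c >> bit) & 1)
    return wires


N = 8
routes = {}                                    # (control word c, input i) -> output j
for c in range(N):
    out = butterfly(list(range(N)), c)         # out[j] is the input that reaches output j
    for j, i in enumerate(out):
        routes[(c, i)] = j
```

Note that only log_2 N control bits are needed in total, since every swapper in a row shares the same control bit.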



For comparison, the method in [8, 9] also uses a
butterfly network, but changes all N log N controlled swappers
independently at each control step c instead of using the
same setting for each of the log N columns. The latter
implementation, which is the one proposed here, thus
requires N times fewer control input wires, and we will see
below how it can be further simplified to cut cost.

The butterfly algorithm we have proposed requires
that N be a power of 2. As we will see in Section III,
the cost of a butterfly corner turner is likely to
constitute only a small fraction of the total cost of a large N
radio array, so for a general number of antennas, one
can simply round N up to the nearest power of two for
the corner turner.

3. An even cheaper corner turner using perfect shufflers

FIG. 3: An equivalent network to Figure 2, but reorganized so
that the connections after each row of swappers are identical.
These connections, contained within the tall boxes, are seen
to correspond to a "perfect shuffle", whereby the top half
gets interleaved with the bottom half, and corresponds to
cyclically shifting the address bits that label the inputs on
the left-hand side.



An obvious drawback to the configuration in Figure 2
is that the wiring layer between each stage is different,
which complicates the manufacturing of a device to
implement the network. It turns out that by a suitable
permutation of the nodes after each row of swappers,
one can simplify the wiring diagram so that all layers
become identical, as illustrated in Figure 3.

The required permutation performed between the
wires after each row of swappers, contained within the
tall boxes in the figure, turns out to be a so-called
"perfect shuffle" permutation on N elements, which draws its
name from card shuffling. The perfect shuffle (or Faro
shuffle) is defined as

j \mapsto \begin{cases} 2j & \text{if } j < N/2, \\ 2j - N + 1 & \text{otherwise,} \end{cases}    (3)

and corresponds to interleaving the top and bottom
halves in the input [17].

Figure 3 illustrates that if we write the input row j as a
binary number composed of log_2 N bits, a perfect shuffle
simply permutes its bits cyclically, shifting them all one
notch to the left and moving the leftmost bit all the way
to the right. Applying log_2 N perfect shuffles thus restores
j to its original value. Since the rows of swappers in
Figure 3 can flip the rightmost bit (exchanging two
neighboring rows), the net effect of the nth control bit from
the left at the top of the figure is thus to control the nth
bit from the right of j. In other words, Figure 3 corresponds
to the same permutation sequence p(c, i) = c xor i as
Figure 2, except for the trivial modification that the control
variable c has its bits in reverse order.
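
The relationship between the perfect shuffle and a cyclic bit shift can be verified numerically. The sketch below implements Equation (3), compares it with a one-bit left rotation of the address, and then follows each input through log_2 N layers of "swap the lowest bit, then shuffle". The layer ordering and the mapping of control bits to layers are our assumptions (they depend on details of Figure 3 that cannot be read off the text), so the code only checks that the result has the form i xor g(c) for some rearrangement g of the control words, rather than asserting a specific bit order. Function names are hypothetical.

```python
def perfect_shuffle_index(j: int, n: int) -> int:
    """Equation (3): destination of row j under a perfect shuffle of n rows."""
    return 2 * j if j < n // 2 else 2 * j - n + 1


def rotate_left(j: int, n_bits: int) -> int:
    """Cyclically shift the n_bits-bit address j one position to the left."""
    top = j >> (n_bits - 1)
    return ((j << 1) & ((1 << n_bits) - 1)) | top


def shuffle_exchange(i: int, c: int, n_bits: int) -> int:
    """Follow input i through n_bits layers of (swap lowest bit, then perfect shuffle).

    Assumption: layer m is driven by bit m of c, counted from the least significant bit.
    """
    n = 1 << n_bits
    pos = i
    for m in range(n_bits):
        pos ^= (c >> m) & 1                    # controlled swapper on neighboring rows
        pos = perfect_shuffle_index(pos, n)    # identical wiring after every layer
    return pos


N_BITS = 3
N = 1 << N_BITS
# Effective xor mask produced by each control word, measured on input 0.
masks = [shuffle_exchange(0, c, N_BITS) for c in range(N)]
```

Because the masks are a rearrangement of 0, ..., N - 1, cycling c through all its values still connects every input to every output exactly once.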

To further reduce cost, we can omit the last perfect
shuffle layer, since the resulting network is equivalent to
the butterfly network with its outputs permuted and
retains all of the necessary properties.

III. IMPLEMENTATION

A. Layout

Section II described the basic layout of the butterfly
network. We can solve the corner turning problem for
an N-element interferometer using an N-link butterfly
network. Since this technique is meant to scale to large
radio telescopes, an ideal implementation of the butterfly
network would be built out of large (but not too large)
numbers of identical, inexpensive, and easy-to-connect
parts.

In the discussion below, we use n = log_2 N to refer to
the number of layers in the network.

1. Network cost

If we built a butterfly network corner turner out of
modular components, then the cost could be roughly
computed by counting the number of each type of
component. To build a very large network, or to build many
smaller networks, it would be worth the extra effort to
design the components so that they would be inexpensive
to manufacture and so that as few as possible would be
needed.

The simplest set of components to use would be
discrete 2x2 switches and single-link cables. Each
controlled swapper layer is N/2 identical 2x2 switches
arranged in a column and each perfect shuffle layer
consists of N cables running between pairs of computers
or switches. For a hypothetical 2^20 = 1,048,576-node
interferometer, this comes out to 20 layers, for a total of
10,485,760 switches and 20,971,520 cables. This is doable
(the interferometer would be expensive enough that this
would most likely be only a small part of the cost), but
connecting 20 layers of over one million cables without
making mistakes would be tedious at best.

2. How to further reduce the cost

There is room for a large improvement in the number
of parts needed, though: printed circuit boards
containing hundreds of components are inexpensive (in 2009, 12
inch by 14 inch circuit boards can be fabricated for less
than $30 each, even in small volumes, assuming that the
circuit fits on two layers) and cables are available that
can carry many links' worth of bandwidth. (Of course,
using large cables may only be useful internally: unless
multiple source or sink nodes are on the same board, the
input and output links must each be on their own cable.)

To optimize the perfect shuffles, we will use a simple
property of a cyclic bit shift: an n-bit binary number
b_{n-1} b_{n-2} ... b_0 can be circularly shifted one bit to the
left by first circularly shifting the leftmost n - k bits
one bit to the left, giving
b_{n-2} ... b_{k+1} b_k b_{n-1} b_{k-1} ... b_0,
and then shifting the rightmost k + 1 bits one bit to the
left, giving b_{n-2} ... b_{k+1} b_k b_{k-1} ... b_0 b_{n-1} as
desired. (For example, n = 5 and k = 2 starts with
b_4 b_3 b_2 b_1 b_0, maps it to b_3 b_2 b_4 b_1 b_0, and finally
maps it to b_3 b_2 b_1 b_0 b_4.)


FIG. 4: A 16-link perfect shuffler built out of four cables,
each carrying four links, and two eight-link perfect shufflers,
shaded blue.

If the left sides of 2^(n-k) cables, each carrying 2^k links,
were lined up such that the first cable (cable 0) carried
links 0, ..., 2^k - 1, the second carried links
2^k, ..., 2*2^k - 1, etc., then the leftmost n - k bits of the
link number could be circularly shifted one bit to the left
by arranging the cables into a perfect shuffle. The
rightmost k + 1 bits could be circularly shifted one bit to the
left by connecting each pair of adjacent cables (i.e., all of
the links sharing the leftmost n - k - 1 bits after the first
circular shift) into a board that had two cable inputs and
two cable outputs, with the individual links in the cables
arranged into a perfect shuffle. This construction with
n = 4 and k = 2 is shown in Figure 4.
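
The two-step decomposition of the cyclic bit shift can be checked exhaustively for small n. In the sketch below, the first step rotates the leftmost n - k bits (the shuffle of whole cables) and the second rotates the rightmost k + 1 bits (the shuffle of links on each two-cable board); the composition is compared with a direct n-bit rotation. The function names are ours and hypothetical.

```python
def rotl(x: int, width: int) -> int:
    """Cyclically shift a width-bit number one bit to the left."""
    mask = (1 << width) - 1
    return ((x << 1) & mask) | (x >> (width - 1))


def two_step_shift(x: int, n: int, k: int) -> int:
    """Rotate the leftmost n - k bits of x, then the rightmost k + 1 bits."""
    low_k = x & ((1 << k) - 1)
    high = rotl(x >> k, n - k)                 # step 1: cables arranged in a perfect shuffle
    x = (high << k) | low_k
    low_mask = (1 << (k + 1)) - 1
    low = rotl(x & low_mask, k + 1)            # step 2: links shuffled on a two-cable board
    return (x & ~low_mask) | low


if __name__ == "__main__":
    # The n = 5, k = 2 example: follow the bit b_4 through both steps.
    print(format(two_step_shift(0b10000, 5, 2), "05b"))
```

The check holds for every k between 0 and n - 1, so the choice of k can be made purely on the basis of available cable sizes.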

From these two components, using k = 6 (digital cables
with 64 digital links are commercially available), each
perfect shuffle layer in a 2^20-link butterfly network would
require 16,384 cables and 8192 two-cable perfect
shufflers. This is a nice saving over using single-link cables,
especially in terms of the labor needed to assemble the
layers.
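
The part counts for single-link components and for bundled cables follow directly from the layout. The sketch below reproduces both sets of numbers for n = 20; it is a simple counting exercise under the layout described above, with hypothetical function names.

```python
def single_link_parts(n_bits: int):
    """Total 2x2 switches and single-link cables for n_bits layers of N = 2**n_bits links."""
    n_links = 1 << n_bits
    switches = n_bits * (n_links // 2)         # N/2 switches per controlled swapper layer
    cables = n_bits * n_links                  # N cables per perfect shuffle layer
    return switches, cables


def bundled_shuffle_layer(n_bits: int, k: int):
    """Cables (2**k links each) and two-cable shuffler boards in one perfect shuffle layer."""
    cables = 1 << (n_bits - k)
    boards = cables // 2                       # each board joins one pair of adjacent cables
    return cables, boards


if __name__ == "__main__":
    print(single_link_parts(20))               # whole network, single-link parts
    print(bundled_shuffle_layer(20, 6))        # one layer, 64-link cables
```

Per layer, bundling reduces the cable count by a factor of 2^k = 64 relative to single-link cabling.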

By using larger circuit boards, many controlled swap- 
pers and many two-cable perfect shufflers could fit on a 
single circuit board, further cutting the number of parts. 

As an additional improvement, the layers in a butterfly
network can be rearranged: rather than using n layers,
each of which circularly shifts left by one bit and then
xors the least significant bit, the network could instead
use n/ℓ layers, each of which xors the least significant ℓ
bits with an ℓ-bit control and then circularly shifts by
ℓ bits (if n is not a multiple of ℓ, then one layer would
involve fewer bits). A device that toggles the least
significant ℓ bits of the address does not mix between blocks
of 2^ℓ links, so a simple row of 2^ℓ-link ℓ-bit togglers would
make a 2^n-link ℓ-bit toggler. This saves a factor of ℓ in
the number of togglers needed; ℓ perfect shuffle layers
could go between each toggler layer.

Putting this together for a 2^20-link butterfly network,
using k = 6 and ℓ = 8 gives 20 perfect shuffle layers for
a total of 327,680 cables and 163,840 two-cable perfect
shufflers, and three controlled xor layers containing a total
of 12,288 256-link 8-bit togglers (of which one of the three
layers would only use half of its control inputs).
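
These totals can be reproduced with a short bill-of-materials calculation. The sketch below assumes one perfect shuffle layer per address bit and ceil(n/ℓ) toggler layers, as described above; the function and dictionary key names are hypothetical.

```python
def bill_of_materials(n_bits: int, k: int, ell: int) -> dict:
    """Count parts for a 2**n_bits-link network with 2**k-link cables and ell-bit togglers."""
    cables_per_layer = 1 << (n_bits - k)
    toggler_layers = -(-n_bits // ell)         # ceiling division
    # Number of control bits actually used by each toggler layer.
    bits_per_layer = [min(ell, n_bits - ell * m) for m in range(toggler_layers)]
    return {
        "cables": n_bits * cables_per_layer,
        "two_cable_shufflers": n_bits * (cables_per_layer // 2),
        "toggler_layers": toggler_layers,
        "togglers": toggler_layers * (1 << (n_bits - ell)),
        "control_bits_per_layer": bits_per_layer,
    }


parts = bill_of_materials(20, 6, 8)
total_parts = parts["cables"] + parts["two_cable_shufflers"] + parts["togglers"]

if __name__ == "__main__":
    print(parts)
    print("total parts:", total_parts)
```

The last toggler layer handles only 4 of its 8 address bits, which is the layer that uses half of its control inputs.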

Some further improvements would be possible by 
building layers that circularly shifted by more than one 
bit at a time — these could still use multiple-link cables 
but would need larger, although still probably inexpen- 
sive, multiple-bit perfect shuffle boards. 



B. Technology 

So far, we have discussed layouts from an abstract 
point of view, without considering the particular tech- 
nology used. We will now briefly survey some attractive 
options for implementing this in practice. 



1. Off-the-shelf hardware 

The easiest design to prototype would use small
(16-port, perhaps) nonblocking Ethernet switches and
standard Ethernet cables, with one link per cable. The
switches are physically capable of performing any
combination of swapping and shuffling on their ports, but they
are not generally meant to change their connections on
the fly. Most so-called managed switches can, however,
store a small number (often 4096) of fixed routes indexed
by the destination of the packet that they are switching
(this is called the forwarding table). This means that,
with some care, the destination field could be used to
control the entire route taken by each packet
traversing a butterfly network of Ethernet switches. This could
be inexpensive (a few tens of dollars per switch port in
2009 for one Gbps per link for a programmable managed
switch and significantly more for 10 Gbps) but does not
scale well beyond the square root of the size of the
forwarding table, giving an upper bound of N <= 64. More
flexible switches and routers are available, but tend to be
far more expensive and slower.

FPGAs also allow rapid development. Most FPGAs
have both standard I/O pins, where a high voltage held
for one clock cycle indicates a 1 bit and a low voltage
indicates a 0 bit (these pins are limited to relatively low
data rates on all but the most expensive FPGAs), and
high-speed serial I/O pins, which can send or receive
several gigabits per second on differential pairs of pins. Any
FPGA can easily act as an arbitrary shuffler or
swapper, limited only by the numbers and types of I/O pins
it has. We experimented with FPGA implementations
and found that a 2 x 2 controlled swapper, even on a
bottom-of-the-line Xilinx FPGA, could switch once per
clock cycle, which was 64 million times per second on
our device. The downsides of FPGAs are their price and
the fact that most FPGAs have relatively few high-speed
I/O pins, keeping the cost per link quite high if data rates
higher than one bit per clock per link are needed.

2. Digital ASICs 

Custom application-specific integrated circuits 
(ASICs) can be fabricated in 2009 for a few hundred 
thousand dollars to make a set of masks plus very little 
for each part produced. Any technology that can trans- 
mit and receive high-speed digital data can also be used 
to switch it. For example, CMOS devices can switch 
at moderate speeds (several Gbps) and current-mode 
devices can operate in excess of 10 Gbps per differential 
pair. The number of links switched on a chip is limited 
only by the number of pins available on the chip, and a 
printed circuit board can hold as many of these chips as 
will fit at very low cost. 

In fact, with a signaling technology that can tolerate
enough loss, shufflers could be built on circuit boards
containing no chips at all, keeping costs even lower.

3. Analog switching 

Many protocols for transmitting large data rates over 
copper cables use advanced modulation techniques. For 
example, gigabit Ethernet over CAT5 cabling uses
multilevel signaling to achieve two gigabits per second (1 Gbps
each way) over four wire pairs at low frequency. Devices
to encode, decode, and error correct these kinds of proto- 
cols are complex and require significant power to operate, 
so it would be useful to minimize the number of times 
that data is modulated and demodulated in a butterfly 
network. If we used analog switches that could exchange 
two modulated signals with little enough loss, then we 
could have several layers of controlled swappers between 
each modulator/demodulator pair. 

4. Cable technologies

Technologies to send large data rates over copper
cable are well established. Over short distances, a single
conductor can carry one link. Over longer distances,
differential signals are usually sent over one pair of
conductors per link, arranged into some form of transmission
line, and different techniques can be used to modulate
the signal depending on the frequency response of the
transmission line, the available transmission power, and
cost considerations. Copper cables have the advantages
of being relatively easy to construct, easy to connect, and
easy to interface with switching electronics.

Optical fibers, on the other hand, can carry much 
higher data rates than copper over a single fiber, and 
all-optical switching technologies (photonic or mechani- 
cal) can rapidly switch these high data rates with little 
loss. Optical cables are inexpensive, but connecting them 
is labor-intensive and they are far more expensive to in- 
terface with electronics than copper. 

Finally, it is possible to transmit very high data rates
between boards without any cables at all using free-space
optical communication, in which boards have lasers and
photodiodes aimed at each other. These laser beams can
freely intersect each other, and devices that
automatically aim and focus free-space optical links are available.
See [18] for an example of a large network switch built out
of free-space optical links. This technology could
eliminate the need to hand-wire perfect shufflers altogether.

IV. CONCLUSIONS 

The corner turner in a conventional interferometer
ranges from moderately expensive (if the telescope is
small enough to use an off-the-shelf Ethernet switch) to
extremely expensive (if the number of antennas N is very
large). We have presented a corner turning algorithm
based on a butterfly network topology which can solve
the corner turning problem for any N for an O(N log N)
hardware cost with a constant prefactor if inexpensive
custom network parts are used. Even for an
interferometer with over one million antennas, the corner turner
could require well under one million network parts, each
of which could cost only a few dollars in large volumes.
By eliminating a key bottleneck, this bodes well for
future large-N radio telescopes.